Siamese Neural Networks with Random Forest for detecting duplicate question pairs
Authors
Abstract
Determining whether two questions are semantically similar is challenging because questions can take many different structures and forms. In this paper, we use Gated Recurrent Units (GRU) in combination with other widely used machine learning algorithms, such as Random Forest, AdaBoost, and SVM, for the similarity prediction task on a dataset released by Quora consisting of about 400k labeled question pairs. We obtained the best result with a Siamese adaptation of a Bidirectional GRU combined with a Random Forest classifier, which placed us in the top 24% of the Quora Question Pairs competition hosted on Kaggle.
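As a rough illustration of the Siamese idea in the abstract, the sketch below encodes both questions with the same bidirectional GRU weights and turns the two encodings into one pair-feature vector. It is not the authors' implementation: the weights are random and untrained, the toy word embeddings, the dimensions, and the choice of pair features (absolute difference and element-wise product) are all assumptions, and the helper names `init_gru`, `gru_last_state`, `encode`, `pair_features`, and `embed` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(input_dim, hidden_dim, rng):
    """Random, untrained GRU parameters (illustrative assumption)."""
    params = {}
    for gate in ("z", "r", "h"):
        params["W" + gate] = rng.normal(0.0, 0.1, size=(input_dim, hidden_dim))
        params["U" + gate] = rng.normal(0.0, 0.1, size=(hidden_dim, hidden_dim))
        params["b" + gate] = np.zeros(hidden_dim)
    return params

def gru_last_state(params, seq):
    """Run a GRU over seq (shape: steps x input_dim); return the final hidden state."""
    h = np.zeros(params["bz"].shape[0])
    for x in seq:
        z = sigmoid(x @ params["Wz"] + h @ params["Uz"] + params["bz"])  # update gate
        r = sigmoid(x @ params["Wr"] + h @ params["Ur"] + params["br"])  # reset gate
        h_new = np.tanh(x @ params["Wh"] + (r * h) @ params["Uh"] + params["bh"])
        h = (1.0 - z) * h + z * h_new
    return h

def encode(fwd, bwd, seq):
    """Bidirectional encoding: forward pass plus a pass over the reversed sequence."""
    return np.concatenate([gru_last_state(fwd, seq), gru_last_state(bwd, seq[::-1])])

def pair_features(fwd, bwd, q1, q2):
    """Siamese step: both questions go through the SAME encoder weights."""
    e1, e2 = encode(fwd, bwd, q1), encode(fwd, bwd, q2)
    # Assumed pair features: absolute difference and element-wise product.
    return np.concatenate([np.abs(e1 - e2), e1 * e2])

def embed(question, table, dim, rng):
    """Toy embedding: one random vector per distinct lowercase token."""
    vectors = []
    for token in question.lower().split():
        if token not in table:
            table[token] = rng.normal(0.0, 1.0, size=dim)
        vectors.append(table[token])
    return np.stack(vectors)

rng = np.random.default_rng(0)
EMB_DIM, HID_DIM = 8, 4
fwd = init_gru(EMB_DIM, HID_DIM, rng)
bwd = init_gru(EMB_DIM, HID_DIM, rng)
table = {}
q1 = embed("how do I learn python", table, EMB_DIM, rng)
q2 = embed("what is the best way to learn python", table, EMB_DIM, rng)

features = pair_features(fwd, bwd, q1, q2)
identical = pair_features(fwd, bwd, q1, q1)
print(features.shape)
```

In a full pipeline, the feature vectors of all labeled pairs could be stacked into a matrix and passed to a separate classifier, for example a random forest, once the encoder weights have been trained.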
Similar resources
Duplicate Question Pair Detection with Deep Learning
Determining whether two questions are asking the same thing can be challenging, as word choice and sentence structure can vary significantly. Traditional natural language processing techniques such as shingling have been found to have limited success in separating related questions from duplicate questions. Using a dataset of 400,000 labeled question pairs provided by question-and-answer forum Q...
A Siamese Deep Forest
A Siamese Deep Forest (SDF) is proposed in the paper. It is based on the Deep Forest or gcForest proposed by Zhou and Feng and can be viewed as a gcForest modification. It can also be regarded as an alternative to the well-known Siamese neural networks. The SDF uses a modified training set consisting of concatenated pairs of vectors. Moreover, it defines the class distributions in the deep fore...
Together we stand: Siamese Networks for Similar Question Retrieval
Community Question Answering (cQA) services like Yahoo! Answers, Baidu Zhidao, Quora, StackOverflow, etc. provide a platform for interaction with experts and help users to obtain precise and accurate answers to their questions. The time lag between the user posting a question and receiving its answer could be reduced by retrieving similar historic questions from the cQA archives. The main ch...
Town trip forecasting based on data mining techniques
In this paper, a data mining approach is proposed for predicting the duration of town trips (travel time) in New York City. To this end, two novel approaches, one mathematical and one statistical, are first proposed for grouping categorical variables with a huge number of levels. The proposed approaches work based on the cost matrix generated by repetitive post-hoc tests f...
Siamese Instance Search for Tracking - Supplementary Material
To learn the matching function that operates on pairs of data, we use a Siamese architecture with two branches [1, 2]. The Siamese network processes the two inputs separately through individual networks that take the form of a convolutional neural network. For individual branches, we investigate two different network architectures, a small one adapted from AlexNet [5] (Figure 1a) and a very dee...
Journal title:
- CoRR
Volume abs/1801.07288, Issue -
Pages -
Publication date 2018